The DIG (Digitalis Investigation Group) Trial was a randomized, double-blind, multicenter trial with more than 300 centers in the United States and Canada participating. The purpose of the trial was to examine the safety and efficacy of Digoxin in treating patients with congestive heart failure in sinus rhythm. Digitalis was introduced clinically more than 200 years ago and has since become a commonly prescribed medication for the treatment of heart failure; however, there was considerable uncertainty surrounding its safety and efficacy. Small trials indicated that Digoxin alleviated some of the symptoms of heart failure, prolonged exercise tolerance, and generally improved the quality of patients’ lives. Unfortunately, these trials were small and, although they did focus on the effect of treatment on patients’ relief from heart failure symptoms and quality of life, they failed to address the effect of treatment on cardiovascular outcomes. The safety of Digoxin was also a concern. Digoxin toxicity is uncommon in small trials with careful surveillance; however, the long-term effects of therapeutic levels of Digoxin were less clear.
The DIG dataset consists of baseline and outcome data from the main DIG trial. In the main trial, heart failure patients who met the eligibility criteria and whose ejection fraction was 45% or less were randomized to receive either a placebo or digoxin. Outcomes assessed in the trial included cardiovascular mortality, hospitalization or death from worsening heart failure, hospitalization due to other cardiovascular causes, and hospitalization due to non-cardiovascular causes.
The DIG dataset was obtained for the purpose of this assignment and is enclosed with this assignment. The codebook associated with the variables is also enclosed with your assignment.
In order to create an anonymous dataset that protects patient confidentiality, most variables have been permuted over the set of patients within treatment group. Therefore, it would be inappropriate to use this dataset for other research or publication purposes.
Change my name and id to yours in the YAML above and complete the tasks below by inserting the required code in the R chunks provided under each task. Then knit the document to generate an HTML document with your solutions.
Read in the csv file DIG.csv provided in your assignment and call it dig.df.
Select the following variables from the data:
ID, TRTMT, AGE, SEX, BMI, KLEVEL, CREAT, DIABP, SYSBP and
HYPERTEN, CVD, WHF, DIG, HOSP, HOSPDAYS, DEATH, DEATHDAY.
Then convert each column to the most relevant data type, i.e. characters to character, numbers to numeric, etc.
if (!require("janitor")) install.packages("janitor")
## Loading required package: janitor
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
if (!require("tidyverse")) install.packages("tidyverse")
## Loading required package: tidyverse
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.1 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.2.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
if (!require("lubridate")) install.packages("lubridate")
if (!require("plotly")) install.packages("plotly")
## Loading required package: plotly
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
if (!require("gganimate")) install.packages("gganimate")
## Loading required package: gganimate
if (!require("gifski")) install.packages("gifski")
## Loading required package: gifski
if (!require("ggvis")) install.packages("ggvis")
## Loading required package: ggvis
##
## Attaching package: 'ggvis'
##
## The following object is masked from 'package:gganimate':
##
## view_static
##
## The following objects are masked from 'package:plotly':
##
## add_data, hide_legend
##
## The following object is masked from 'package:ggplot2':
##
## resolution
if (!require("gghighlight")) install.packages("gghighlight")
## Loading required package: gghighlight
library(tidyverse)
library(janitor)
library(lubridate)
library(table1)
##
## Attaching package: 'table1'
##
## The following objects are masked from 'package:base':
##
## units, units<-
library(plotly)
#library(gganimate)
library(gifski)
library(ggvis)
library(gghighlight)
library(survminer)
## Loading required package: ggpubr
library(survival)
##
## Attaching package: 'survival'
##
## The following object is masked from 'package:survminer':
##
## myeloma
#importing data
#read.csv("C:/Users/ak/Desktop/gds/r_assignment/DIG.csv")
## Insert your code here
dig.df <- read.csv("C:/Users/ak/Desktop/gds/r_assignment/DIG.csv") %>%
janitor::clean_names()
#dig.df
## Insert your code here
#cleaning the data and selecting the desired information to work on
dig_new.df <- dig.df %>%
  mutate(
    trtmt = factor(trtmt, levels = c(0, 1), labels = c("Placebo", "Treatment")),
    sex = factor(sex, levels = c(1, 2), labels = c("Males", "Females")),
    hyperten = factor(hyperten, levels = c(0, 1), labels = c("No", "Yes")),
    cvd = factor(cvd, levels = c(0, 1), labels = c("No", "Yes")),
    whf = factor(whf, levels = c(0, 1), labels = c("No", "Yes")),
    dig = factor(dig, levels = c(0, 1), labels = c("No", "Yes")),
    hosp = factor(hosp, levels = c(0, 1), labels = c("No", "Yes")),
    death = factor(death, levels = c(0, 1), labels = c("Alive", "Death"))
  ) %>%
  select(id, trtmt, age, sex, bmi, klevel, creat, diabp, sysbp, hyperten, cvd, whf, dig, hosp, hospdays, death, deathday)
# show tidy new data frame
#dig_new.df
Use both appropriate summary statistics and visualisation to answer the questions below.
Note: you must use appropriate labeling, captions and legends for your graphics and tables.
Note: you must use alternative colouring than the default coloring for your plots.
Note: you must interpret all your summaries and figures in order to answer the question that was asked.
Note: where appropriate use interactive graphics.
Hint: package table1 could be quite helpful in obtaining summary statistics. For more info visit here.
Summarise the number (and proportion) of patients hired within each treatment group.
## Insert your code here
# calculating the proportion by grouping and summarising
dig_new.df %>%
group_by(trtmt,sex) %>%
summarise( count = n(), proportion = (n() / nrow(dig.df))*100, .groups = "drop")
## # A tibble: 4 × 4
## trtmt sex count proportion
## <fct> <fct> <int> <dbl>
## 1 Placebo Males 2639 38.8
## 2 Placebo Females 764 11.2
## 3 Treatment Males 2642 38.9
## 4 Treatment Females 755 11.1
# summarising data using table1
label(dig_new.df$trtmt) <- "Treatment"
table1(~trtmt| sex , data = dig_new.df, caption = "Number of Patients hired within each Treatment Group")
| | Males (N=5281) | Females (N=1519) | Overall (N=6800) |
|---|---|---|---|
| Treatment | | | |
| Placebo | 2639 (50.0%) | 764 (50.3%) | 3403 (50.0%) |
| Treatment | 2642 (50.0%) | 755 (49.7%) | 3397 (50.0%) |
Interpretation ….. Patients are split almost exactly evenly between the two arms: 3403 (50.0%) received placebo and 3397 (50.0%) received digoxin. The split is just as even within each sex: 50.0% of males are in each arm, and 50.3% of females are in the placebo arm against 49.7% in the treatment arm. The sample contains far more males (5281) than females (1519), but treatment allocation does not differ by sex.
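Since the question asks for numbers and proportions within each treatment group, a per-arm summary without the sex split may also be useful. The following is a minimal base R sketch that takes the two arm totals from the table above as input; the object names `arm_counts` and `arm_summary` are illustrative and do not appear in the assignment.

```r
# Arm totals copied from the "Overall" column of the table above
arm_counts <- c(Placebo = 3403, Treatment = 3397)

# Count and percentage of all randomized patients in each arm
arm_summary <- data.frame(
  trtmt = names(arm_counts),
  count = as.integer(arm_counts),
  proportion = round(100 * as.numeric(prop.table(arm_counts)), 2)
)
arm_summary
```

With the real data frame the same table could be produced by grouping on `trtmt` alone rather than on `trtmt` and `sex`.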
Assess if there is any significant differences in base-line characteristics (e.g. Age, Sex, BMI, …) between the patients assigned to digoxin and patients assigned to placebo and comment on any unusual pattern you see:
## Insert your code here
# table labels
#dig_new.df
label(dig_new.df$sex) <- "Sex"
label(dig_new.df$age) <- "Age"
label(dig_new.df$bmi) <- "BMI"
label(dig_new.df$klevel)<-"KLEVEL"
label(dig_new.df$creat)<- "CREAT"
label(dig_new.df$diabp)<-"DIABP"
label(dig_new.df$sysbp) <- "SYSBP"
label(dig_new.df$hyperten) <- "HYPERTEN"
label(dig_new.df$cvd) <- "CVD"
label(dig_new.df$whf) <- "WHF"
label(dig_new.df$dig) <- "DIG"
label(dig_new.df$hosp) <- "HOSP"
label(dig_new.df$death) <- "DEATH"
# creating a table of baseline characteristics against treatment using table1
table1(~ age + sex + bmi + klevel + creat + diabp + sysbp + hyperten + cvd + whf + dig + hosp + death | trtmt, data = dig_new.df, caption = "Summary of Base-Line Characteristics for Digoxin and Placebo")
| | Placebo (N=3403) | Treatment (N=3397) | Overall (N=6800) |
|---|---|---|---|
| Age | | | |
| Mean (SD) | 63.5 (10.8) | 63.4 (11.0) | 63.5 (10.9) |
| Median [Min, Max] | 65.0 [22.0, 90.0] | 64.0 [21.0, 90.0] | 65.0 [21.0, 90.0] |
| Sex | | | |
| Males | 2639 (77.5%) | 2642 (77.8%) | 5281 (77.7%) |
| Females | 764 (22.5%) | 755 (22.2%) | 1519 (22.3%) |
| BMI | | | |
| Mean (SD) | 27.2 (5.19) | 27.0 (5.19) | 27.1 (5.19) |
| Median [Min, Max] | 26.6 [14.4, 62.7] | 26.4 [15.2, 58.3] | 26.5 [14.4, 62.7] |
| Missing | 1 (0.0%) | 0 (0%) | 1 (0.0%) |
| KLEVEL | | | |
| Mean (SD) | 4.46 (7.87) | 4.33 (0.511) | 4.40 (5.57) |
| Median [Min, Max] | 4.30 [0, 434] | 4.30 [0, 6.30] | 4.30 [0, 434] |
| Missing | 410 (12.0%) | 391 (11.5%) | 801 (11.8%) |
| CREAT | | | |
| Mean (SD) | 1.29 (0.372) | 1.28 (0.366) | 1.29 (0.369) |
| Median [Min, Max] | 1.21 [0.100, 3.05] | 1.20 [0.500, 3.76] | 1.20 [0.100, 3.76] |
| DIABP | | | |
| Mean (SD) | 74.9 (11.1) | 74.9 (11.5) | 74.9 (11.3) |
| Median [Min, Max] | 75.0 [38.0, 140] | 75.0 [25.0, 184] | 75.0 [25.0, 184] |
| Missing | 3 (0.1%) | 2 (0.1%) | 5 (0.1%) |
| SYSBP | | | |
| Mean (SD) | 126 (19.9) | 126 (19.9) | 126 (19.9) |
| Median [Min, Max] | 124 [74.0, 202] | 122 [78.0, 220] | 123 [74.0, 220] |
| Missing | 2 (0.1%) | 1 (0.0%) | 3 (0.0%) |
| HYPERTEN | | | |
| No | 1846 (54.2%) | 1869 (55.0%) | 3715 (54.6%) |
| Yes | 1557 (45.8%) | 1527 (45.0%) | 3084 (45.4%) |
| Missing | 0 (0%) | 1 (0.0%) | 1 (0.0%) |
| CVD | | | |
| No | 1553 (45.6%) | 1703 (50.1%) | 3256 (47.9%) |
| Yes | 1850 (54.4%) | 1694 (49.9%) | 3544 (52.1%) |
| WHF | | | |
| No | 2223 (65.3%) | 2487 (73.2%) | 4710 (69.3%) |
| Yes | 1180 (34.7%) | 910 (26.8%) | 2090 (30.7%) |
| DIG | | | |
| No | 3372 (99.1%) | 3330 (98.0%) | 6702 (98.6%) |
| Yes | 31 (0.9%) | 67 (2.0%) | 98 (1.4%) |
| HOSP | | | |
| No | 1121 (32.9%) | 1213 (35.7%) | 2334 (34.3%) |
| Yes | 2282 (67.1%) | 2184 (64.3%) | 4466 (65.7%) |
| DEATH | | | |
| Alive | 2209 (64.9%) | 2216 (65.2%) | 4425 (65.1%) |
| Death | 1194 (35.1%) | 1181 (34.8%) | 2375 (34.9%) |
# importing the data and selecting the desired characteristics
# dig_plot.df <- read.csv("C:/Users/ak/Desktop/gds/r_assignment/DIG.csv")
# dig_plot.df %>%
# select(ID, TRTMT, AGE, SEX, BMI, KLEVEL, CREAT, DIABP, SYSBP)
# removing missing values
#dig_plot.df <- na.omit(dig_plot.df)
#dig_plot.df
#parallel coordinate plot
# graph <- dig_plot.df %>%
# plot_ly(type = 'parcoords',
# line = list(color = dig_plot.df$TRTMT,
# colorscale = list(c(0, 'purple'), c(1, 'orange')),
# showscale = T),
# dimensions = list(
# list(tickvals = c(0, 1), ticktext = c('Placebo', 'Treatment'), label = "TRTMT", values = dig_plot.df$TRTMT),
# list(range = c(min(dig_plot.df$AGE), max(dig_plot.df$AGE)), label = "AGE", values = dig_plot.df$AGE),
# list(range = c(min(dig_plot.df$BMI), max(dig_plot.df$BMI)), label = "BMI", values = dig_plot.df$BMI),
# list(range = c(min(dig_plot.df$KLEVEL), max(dig_plot.df$KLEVEL)),label = "KLEVEL", values = dig_plot.df$KLEVEL),
# list(range = c(min(dig_plot.df$CREAT), max(dig_plot.df$CREAT)), label = "CREAT", values = dig_plot.df$CREAT),
# list(range = c(min(dig_plot.df$DIABP), max(dig_plot.df$DIABP)), label = "DIABP", values = dig_plot.df$DIABP),
# list(range = c(min(dig_plot.df$SYSBP), max(dig_plot.df$SYSBP)), label = "SYSBP", values = dig_plot.df$SYSBP),
# list(tickvals = c(1, 2), ticktext = c('Male', 'Female'), label = "SEX", values = dig_plot.df$SEX)
# ))%>%
# layout(margin = list(t = 100), ##bottom margin in pixels
# annotations =
# list(x = .5, y = 1.22, #position of text adjust as needed
# text = "Baseline Characteristics for Treatment Group",
# showarrow = F,
# font=list(size=15, color= "black")))
#
#
# # Show the plot
# graph
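Interpretation ….. The digoxin and placebo groups are well balanced on the baseline characteristics: the group means and proportions for age, sex, BMI, blood pressure, creatinine and hypertension are nearly identical. One unusual pattern is KLEVEL in the placebo group, where a maximum of 434 inflates the SD to 7.87 (against 0.511 under treatment) even though the medians are the same (4.30); this looks like an outlier or data-entry error, and KLEVEL is also missing for 11.8% of patients. CVD (54.4% vs 49.9%), WHF (34.7% vs 26.8%) and hospitalization (67.1% vs 64.3%) are somewhat more frequent in the placebo group than in the treatment group, although these variables appear to record events during follow-up rather than baseline status. The majority of patients are male (77.7%), with a mean BMI of 27.1. Overall, the two groups are comparable at baseline, so any difference in outcomes is more plausibly due to the treatment than to baseline imbalance.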
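The table alone does not say whether the small baseline differences are statistically significant. As a rough check, the sketch below runs a Welch t-test for age from the rounded summary statistics in the table and a chi-squared test for sex from the table counts. The helper `welch_from_summary()` is a hypothetical function written for this illustration, and because the inputs are rounded the p-values are approximate; running `t.test()` on the raw columns would be the more exact route.

```r
# Welch two-sample t-test computed from group means, SDs and sizes
# (hypothetical helper; inputs are the rounded values shown in the table)
welch_from_summary <- function(m1, s1, n1, m2, s2, n2) {
  v1 <- s1^2 / n1
  v2 <- s2^2 / n2
  t_stat <- (m1 - m2) / sqrt(v1 + v2)
  # Welch-Satterthwaite degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  c(t = t_stat, df = df, p = 2 * pt(-abs(t_stat), df))
}

# Age: placebo 63.5 (10.8), n = 3403; treatment 63.4 (11.0), n = 3397
age_test <- welch_from_summary(63.5, 10.8, 3403, 63.4, 11.0, 3397)

# Sex by treatment arm, counts copied from the table
sex_counts <- matrix(c(2639, 764, 2642, 755), nrow = 2,
                     dimnames = list(sex = c("Males", "Females"),
                                     trtmt = c("Placebo", "Treatment")))
# stats:: prefix because janitor masks chisq.test() in this document
sex_test <- stats::chisq.test(sex_counts)

age_test
sex_test$p.value
```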
Assess if the overall mortality was affected by the treatment.
## Insert your code here
# grouping, summarizing and mutating the data
a <- dig_new.df%>%
group_by(trtmt,death)%>%
summarise(count = n(), .groups = "drop")%>%
group_by(trtmt)%>% # to ensure correct denominator
mutate(perctage = count / sum(count)*100 )
a
## # A tibble: 4 × 4
## # Groups: trtmt [2]
## trtmt death count perctage
## <fct> <fct> <int> <dbl>
## 1 Placebo Alive 2209 64.9
## 2 Placebo Death 1194 35.1
## 3 Treatment Alive 2216 65.2
## 4 Treatment Death 1181 34.8
#table1(~death|trtmt, data = dig_new.df, caption = 'Overall Effect of Treatment on Mortality')
# ggplot
g <- ggplot(data = a,
            mapping = aes(x = death, y = perctage, fill = death)) +
  geom_bar(stat = "identity", alpha = 0.6) +
  facet_wrap(~trtmt) +  # one panel per arm, so the two arms' percentages are not stacked
  labs(
    title = "Barplot of Overall Effect of Treatment on Mortality",
    fill = "Mortality",
    caption = "Source: DIG-Digitalis Investigation Group",
    x = "Mortality status",
    y = "Percentage %") +
  scale_fill_manual(values = c("Alive" = "cyan", "Death" = "lightgreen")) +
  theme_classic()
# show the plot
ggplotly(g)
Interpretation ….. Mortality is almost identical in the two groups: 35.1% of placebo patients and 34.8% of treatment patients died, so the treatment does not appear to affect overall mortality.
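To put a number on "almost identical", a chi-squared test of independence can be run on the four counts from the summary above. This is a minimal sketch that rebuilds the 2 × 2 table by hand (the name `death_counts` is illustrative); on the real data the same test could be applied to `table(dig_new.df$death, dig_new.df$trtmt)`.

```r
# Alive/dead counts per arm, copied from the summary table
death_counts <- matrix(c(2209, 1194, 2216, 1181), nrow = 2,
                       dimnames = list(death = c("Alive", "Death"),
                                       trtmt = c("Placebo", "Treatment")))

# stats:: prefix because janitor masks chisq.test() in this document
death_test <- stats::chisq.test(death_counts)
death_test$p.value
```

A large p-value here is consistent with the interpretation that overall mortality does not differ between arms.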
Assess if the Cardiovascular disease (CVD) is associated
with the mortality overall and also within each treatment group.
## Insert your code here
# table showing overall cvd association with mortality and treatment groups using table1
table1(~cvd|trtmt + death, data = dig_new.df)
| | Placebo: Alive (N=2209) | Placebo: Death (N=1194) | Treatment: Alive (N=2216) | Treatment: Death (N=1181) | Overall: Alive (N=4425) | Overall: Death (N=2375) |
|---|---|---|---|---|---|---|
| CVD | | | | | | |
| No | 1150 (52.1%) | 403 (33.8%) | 1246 (56.2%) | 457 (38.7%) | 2396 (54.1%) | 860 (36.2%) |
| Yes | 1059 (47.9%) | 791 (66.2%) | 970 (43.8%) | 724 (61.3%) | 2029 (45.9%) | 1515 (63.8%) |
#ggplot
g0 <- ggplot(data = dig_new.df,
mapping = aes(x = cvd, fill = death)) +
facet_wrap(~trtmt) +
geom_bar(position="fill" ) +
scale_y_continuous(labels = scales::percent) +
labs(title = "Barplot of Overall Effect of Treatment on Mortality and CVD",
x ="CVD Status", y = "Percentage",
fill = "Mortality") +
scale_fill_manual(values = c("Alive" = "skyblue" , "Death" = "orange" ))+
theme_classic()
#show plot
ggplotly(g0)
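Interpretation ….. Among survivors, more patients were free of CVD (54.1%) than had CVD (45.9%), whereas among those who died 63.8% had CVD and only 36.2% did not. Expressed as mortality rates computed from the same counts, 42.7% of patients with CVD died (1515 of 3544) compared with 26.4% of patients without CVD (860 of 3256). The pattern is the same within each arm (placebo: 42.8% vs 25.9%; treatment: 42.7% vs 26.8%), so CVD is associated with higher mortality regardless of treatment.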
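The table1 output gives column percentages (the share of CVD within each survival status). To read the association as a mortality rate within each CVD group, the percentages need to be taken the other way round. A small sketch using the counts from the table above; the data frame `cvd_counts` is built by hand for illustration.

```r
# Alive/died counts by arm and CVD status, copied from the table above
cvd_counts <- data.frame(
  trtmt = rep(c("Placebo", "Treatment"), each = 2),
  cvd   = rep(c("No", "Yes"), times = 2),
  alive = c(1150, 1059, 1246, 970),
  died  = c(403, 791, 457, 724)
)

# Mortality rate within each arm-by-CVD cell (row percentage)
cvd_counts$n <- cvd_counts$alive + cvd_counts$died
cvd_counts$mortality_pct <- round(100 * cvd_counts$died / cvd_counts$n, 1)
cvd_counts
```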
Assess if the hospitalizations was affected by the treatment.
## Insert your code here
# table for relation between hospitalization and treatment
label(dig_new.df$hosp) <- "Hospitalization"
table1(~hosp|trtmt,dig_new.df,caption = "Hospitalizations affected by treatment")
| | Placebo (N=3403) | Treatment (N=3397) | Overall (N=6800) |
|---|---|---|---|
| Hospitalization | | | |
| No | 1121 (32.9%) | 1213 (35.7%) | 2334 (34.3%) |
| Yes | 2282 (67.1%) | 2184 (64.3%) | 4466 (65.7%) |
# ggplot
g1<- ggplot(data = dig_new.df,
mapping = aes(x = trtmt, fill = hosp)) +
geom_bar(position = "fill", alpha = 0.8) +
scale_y_continuous(labels = scales::percent) +
labs(
title = "Overall Effect of Treatment on Hospitalizations",
caption = "Source: DIG-Digitalis Investigation Group",
x ="Treatment",
y = "Percentage",
fill = "Hospitalized")+
scale_fill_manual(values = c("Yes" = "maroon" , "No" = "lightgreen" )) +
theme_classic()
#show plot
ggplotly(g1)
Interpretation ….. Most patients in both groups were hospitalized (placebo: 67.1%, treatment: 64.3%). Hospitalization was slightly less frequent under treatment than under placebo, a difference of 2.8 percentage points.
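Whether a gap of this size is more than chance can be checked with a two-sample test of proportions. The sketch below uses base R's `prop.test()` on the hospitalized counts and arm sizes from the table above; it is one reasonable choice of test, not necessarily the one the assignment has in mind.

```r
# Hospitalized patients and arm sizes, in the order Placebo, Treatment
hosp_test <- prop.test(x = c(2282, 2184), n = c(3403, 3397))

hosp_test$estimate   # proportion hospitalized in each arm
hosp_test$conf.int   # 95% CI for the difference (placebo minus treatment)
hosp_test$p.value
```

The confidence interval for the difference is arguably more informative than the p-value alone, since it shows how small the plausible effect is.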
Assess if the Worsening heart failure (WHF) is
associated with the hospitalizations overall and also within each
treatment group:
## Insert your code here
# grouping, summarising and mutating data frame
label(dig_new.df$whf) <- "Worsening heart failure"
m <-dig_new.df%>%
group_by(whf,hosp,trtmt)%>%
summarise(count = n(), .groups = "drop")%>%
group_by(trtmt,hosp)%>%
mutate(perctage = count / sum(count)*100 )
m
## # A tibble: 6 × 5
## # Groups: trtmt, hosp [4]
## whf hosp trtmt count perctage
## <fct> <fct> <fct> <int> <dbl>
## 1 No No Placebo 1121 100
## 2 No No Treatment 1213 100
## 3 No Yes Placebo 1102 48.3
## 4 No Yes Treatment 1274 58.3
## 5 Yes Yes Placebo 1180 51.7
## 6 Yes Yes Treatment 910 41.7
# summarizing using table1
table1(~whf|trtmt +hosp, dig_new.df, caption = "Effect of Worsening Heart Failure on Hospitalization in Patients")
| | Placebo: No (N=1121) | Placebo: Yes (N=2282) | Treatment: No (N=1213) | Treatment: Yes (N=2184) | Overall: No (N=2334) | Overall: Yes (N=4466) |
|---|---|---|---|---|---|---|
| Worsening heart failure | | | | | | |
| No | 1121 (100%) | 1102 (48.3%) | 1213 (100%) | 1274 (58.3%) | 2334 (100%) | 2376 (53.2%) |
| Yes | 0 (0%) | 1180 (51.7%) | 0 (0%) | 910 (41.7%) | 0 (0%) | 2090 (46.8%) |
#ggplot
g2 <- ggplot(data = dig_new.df,
mapping = aes(x = whf, fill = hosp)) +
facet_wrap(~trtmt) +
geom_bar(position="fill" ) +
scale_y_continuous(labels = scales::percent) +
labs(title = "Barplot of Overall Effect of WHF on Hospitalization and Treatment",
x ="WHF Status", y = "Percentage",
fill = "Hospitalised") +
scale_fill_manual(values = c("Yes" = "orchid" , "No" = "coral" ))+
theme_classic()
#show plot
ggplotly(g2)
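Interpretation ….. Every patient with worsening heart failure was hospitalized (no WHF cases appear among the non-hospitalized), so WHF is strongly associated with hospitalization. Among hospitalized patients, WHF was present in 51.7% of the placebo group but only 41.7% of the treatment group (46.8% overall), so patients receiving digoxin were less often hospitalized with worsening heart failure.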
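The percentages above are taken among hospitalized patients. Expressed instead as a risk among all randomized patients in each arm, the same counts give a risk difference and a risk ratio. A short sketch, with the counts typed in from the tables (object names are illustrative):

```r
# WHF cases divided by all patients randomized to each arm
whf_risk <- c(Placebo = 1180 / 3403, Treatment = 910 / 3397)

# Absolute and relative comparison of treatment against placebo
risk_difference <- whf_risk[["Treatment"]] - whf_risk[["Placebo"]]
risk_ratio <- whf_risk[["Treatment"]] / whf_risk[["Placebo"]]

round(100 * whf_risk, 1)
round(c(risk_difference = risk_difference, risk_ratio = risk_ratio), 3)
```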
Create a new variable Month by dividing the variable DEATHDAY by 30 and rounding it to the nearest whole number.
## Insert your code here
# adding a new column to the existing data frame using mutate
# (deathday is already a column of dig_new.df, so no need to reach back into dig.df)
dig_new1.df <- dig_new.df %>%
  mutate(Month = round(deathday / 30))
#show data frame
#dig_new1.df
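One detail of `round()` is worth knowing when converting days to months: R follows the IEC 60559 convention and rounds exact halves to the nearest even number, so 15 days becomes month 0 while 45 days becomes month 2. The sketch below uses made-up `deathday` values to compare this with a "round half up" alternative; which rule is preferable for this assignment is a judgment call.

```r
# Hypothetical follow-up times in days (not taken from the DIG data)
deathday <- c(14, 15, 16, 45, 75, 1770)

# round(): exact halves go to the nearest even integer
month_round <- round(deathday / 30)

# Alternative: halves always round up
month_half_up <- floor(deathday / 30 + 0.5)

data.frame(deathday, month_round, month_half_up)
```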
Summarise the variable Month created above.
## Insert your code here
# calculating minimum, maximum, mean, median, standard deviation using the summarise function
h <- dig_new1.df %>%
summarise(
Minimum = min(Month),
Maximum = max(Month),
Mean = mean(Month),
Median = median(Month),
Std_Deviation = sd(Month)
)
#show table
h
## Minimum Maximum Mean Median Std_Deviation
## 1 0 59 35.44868 38 15.17628
# calculating minimum, maximum, mean, median, standard deviation using table1
table1(~Month, data = dig_new1.df, caption = "Months up to Last Follow Up or Death")
| | Overall (N=6800) |
|---|---|
| Month | |
| Mean (SD) | 35.4 (15.2) |
| Median [Min, Max] | 38.0 [0, 59.0] |
Interpretation ….. On average, patients were followed for approximately 3 years (mean 35.4 months, median 38 months), with the longest follow-up just under 5 years (maximum 59 months).
Summarise the risk of mortality within each month.
HINT: you may want to use survfit
function in Survival package to extract required
mortality rate within each month. For an example see here
## Insert your code here
# loading the following packages
library(knitr)
library(survival)
library(ggsurvfit)
# importing the data
y <- read.csv("C:/Users/ak/Desktop/gds/r_assignment/DIG.csv")
#selecting the desired data and adding Month column by using mutate
y<-y%>%
select(ID, TRTMT, AGE, SEX, BMI, KLEVEL, CREAT, DIABP, SYSBP, HYPERTEN, CVD, WHF, DIG, HOSP, HOSPDAYS, DEATH, DEATHDAY)%>%
janitor::clean_names()%>%
mutate(month = round(deathday/30))
#using survfit function to estimated survival probabilities
f <- Surv(time = y$month, event = y$death)
f1 <- survfit(f ~1, data = y)
# graph for survfit
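# type = "risk" plots 1 - survival, so the curve matches the "Cumulative Incidence" axis label
ggsurvfit(f1, type = "risk", linewidth = 1) +
  labs(x = "Months", y = "Cumulative Incidence") +
  add_risktable() +
  scale_ggsurvfit()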
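The plot shows the overall shape, but the task also asks for a summary of the risk within each month. One way to get a month-by-month table from a `survfit` object is `summary()` with a `times` argument. The sketch below uses a tiny made-up dataset (`toy`, not the DIG data) so that it runs on its own; with the real data, `f1` and the observed range of months would take the place of `fit` and `1:6`.

```r
library(survival)

# Toy data (hypothetical): follow-up in months and death indicator (1 = died)
toy <- data.frame(
  month = c(1, 2, 2, 3, 3, 3, 4, 5, 5, 6),
  death = c(1, 0, 1, 1, 0, 1, 0, 1, 0, 0)
)

fit <- survfit(Surv(month, death) ~ 1, data = toy)
s <- summary(fit, times = 1:6)

# One row per month: patients at risk, deaths in that month,
# conditional risk in the month and Kaplan-Meier cumulative risk
monthly_risk <- data.frame(
  month = s$time,
  at_risk = s$n.risk,
  deaths = s$n.event,
  risk_in_month = round(s$n.event / s$n.risk, 3),
  cumulative_risk = round(1 - s$surv, 3)
)
monthly_risk
```

Here `risk_in_month` is the share of patients still at risk who die in that month, while `cumulative_risk` is the quantity drawn by the plot.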
Interpretation ….. The cumulative risk of death rises steadily with follow-up time, and the survival probability falls correspondingly. In total, 2375 of the 6800 patients (34.9%) died during follow-up.
#### Question 10:
Summarise the risk of mortality within each month and for each treatment group.
## Insert your code here
y <- read.csv("C:/Users/ak/Desktop/gds/r_assignment/DIG.csv")
light<-y%>%
select(ID, TRTMT, AGE, SEX, BMI, KLEVEL, CREAT, DIABP, SYSBP, HYPERTEN, CVD, WHF, DIG, HOSP, HOSPDAYS, DEATH, DEATHDAY)%>%
janitor::clean_names()%>%
mutate(Month = round(deathday/30))
fa <- survfit(Surv(time = light$Month, event = light$death) ~trtmt, data = light)
fa
## Call: survfit(formula = Surv(time = light$Month, event = light$death) ~
## trtmt, data = light)
##
## n events median 0.95LCL 0.95UCL
## trtmt=0 3403 1194 NA NA NA
## trtmt=1 3397 1181 NA NA NA
ggsurvplot(fa, data =light ,
linewidth = 1,
palette = c("black","orange"),
censor.shape = '|', censor.size = 4,
conf.int = T,
pval = T,
risk.table = T,
risk.table.col = 'strata',
legend.labs = c("Placebo", "Treatment"),
risk.table.height = 0.25,
title = "Mortality Risk per Month and Treatment Group")
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## ℹ The deprecated feature was likely used in the ggpubr package.
## Please report the issue at <https://github.com/kassambara/ggpubr/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
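The `pval = T` option prints a p-value on the plot; by default this is understood to come from a log-rank test comparing the two curves. The same comparison can be run directly with `survdiff()` from the survival package. The sketch below uses simulated data (`toy`) in which both arms share the same hazard by construction, so it only illustrates the mechanics and says nothing about the DIG result; with the real data the call would be `survdiff(Surv(Month, death) ~ trtmt, data = light)`.

```r
library(survival)

set.seed(7)
n <- 300

# Simulated trial (assumption: identical exponential hazard in both arms)
toy <- data.frame(trtmt = rep(c(0, 1), each = n / 2))
event_time <- rexp(n, rate = 0.012)

# Follow-up is cut off at 60 months; later events are censored
toy$month <- pmin(ceiling(event_time), 60)
toy$death <- as.integer(event_time <= 60)

# Log-rank test of equal survival in the two arms
lr <- survdiff(Surv(month, death) ~ trtmt, data = toy)
p_value <- 1 - pchisq(lr$chisq, df = length(lr$n) - 1)

lr
p_value
```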
Interpretation ….. Treatment does not show any significant improvement in mortality risk over time compared to placebo.
#### Question 11:
Assess the effect of CVD on the risk of mortality within each month and for each treatment group:
#
dig_new1.df %>%
  filter(death == "Death") %>%
  ggplot() +
  geom_point(mapping = aes(x = cvd, y = death, colour = trtmt),
             position = "jitter", alpha = 1) +  # this `+` was missing, so labs() and the theme were never applied
  labs(x = "Effect of CVD",
       y = "Mortality") +
  theme_minimal()
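The scatter plot above only shows patients who died, so it cannot display how risk develops by month. A sketch of one alternative is to fit Kaplan-Meier curves with both CVD and treatment as strata and read off the cumulative risk at selected months. The data below are simulated (`toy`, with an assumed higher hazard for CVD patients) purely so the example runs on its own; with the real data the formula would be `Surv(Month, death) ~ cvd + trtmt` on the data frame used for the previous question.

```r
library(survival)

set.seed(42)
n <- 400

# Simulated data: 100 patients in each arm-by-CVD cell
toy <- data.frame(
  trtmt = rep(c("Placebo", "Treatment"), each = n / 2),
  cvd = rep(c("No", "Yes"), times = n / 2)
)

# Assumption for illustration: CVD doubles the monthly hazard
rate <- ifelse(toy$cvd == "Yes", 0.02, 0.01)
event_time <- rexp(n, rate)
toy$month <- pmin(ceiling(event_time), 60)
toy$death <- as.integer(event_time <= 60)

# One curve per combination of CVD status and treatment arm
fit <- survfit(Surv(month, death) ~ cvd + trtmt, data = toy)
s <- summary(fit, times = c(12, 24, 36, 48), extend = TRUE)

risk_by_group <- data.frame(
  group = s$strata,
  month = s$time,
  cumulative_risk = round(1 - s$surv, 3)
)
risk_by_group
```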
Interpretation ….. Among patients who died, more had cardiovascular disease than not in both the placebo group (791 vs 403) and the treatment group (724 vs 457). Within each CVD category the number of deaths is similar for placebo and treatment.
#### Question 12:
Assess if there is any linear relationship between systolic and diastolic blood pressures? Is this relationship affected by the treatment the patients received or whether patients have hypertension or not?
## Insert your code here
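ggplot(data = dig_new1.df,
       mapping = aes(x = diabp, y = sysbp, colour = trtmt)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", se = FALSE) +
  scale_colour_manual(values = c("Placebo" = "purple", "Treatment" = "orange")) +
  labs(x = "Diastolic blood pressure", y = "Systolic blood pressure", colour = "Treatment")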
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 8 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 8 rows containing missing values or values outside the scale range
## (`geom_point()`).
Interpretation ….. The placebo and digoxin groups are evenly distributed across the range of systolic and diastolic blood pressures, so the relationship between the two pressures does not appear to depend on the treatment received.
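The question also asks about hypertension, which the plot does not separate out. One way to examine both factors is a correlation coefficient plus a linear model with interaction terms, where a non-zero interaction would mean the slope differs between groups. The sketch below runs on simulated data (`toy`, with an assumed linear relationship and a hypertension offset) only to show the mechanics; with the real data the same formula could be fitted to `dig_new1.df`.

```r
set.seed(1)
n <- 200

# Simulated blood pressure data (hypothetical, not the DIG values)
toy <- data.frame(
  diabp = rnorm(n, mean = 75, sd = 11),
  trtmt = factor(sample(c("Placebo", "Treatment"), n, replace = TRUE)),
  hyperten = factor(sample(c("No", "Yes"), n, replace = TRUE))
)
# Assumed truth: slope of 1, hypertensive patients 8 units higher, no treatment effect
toy$sysbp <- 50 + 1.0 * toy$diabp + 8 * (toy$hyperten == "Yes") + rnorm(n, sd = 12)

# Strength of the linear relationship
cor_pearson <- cor(toy$diabp, toy$sysbp)

# Interaction terms test whether the slope changes with treatment or hypertension
fit <- lm(sysbp ~ diabp * trtmt + diabp * hyperten, data = toy)

cor_pearson
round(summary(fit)$coefficients, 3)
```

In the coefficient table, `diabp:trtmtTreatment` and `diabp:hypertenYes` are the terms to look at when judging whether the relationship is affected by treatment or by hypertension.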